Abstract


This deliverable describes the data management plan, that is, the policy regulating collection, management, 
sharing, archiving, and preservation of data in the MOSAICrOWN project. 
Executive Summary


This deliverable describes the data management plan of the MOSAICrOWN project. It summa- 
rizes the policy regulating collection, management, sharing, archiving, and preservation of data 
in the project. Data regulated by the plan are all data that are either managed or are produced by 
MOSAICrOWN, e.g., documents and software tools. 

We recall that, as already noted at the beginning of the project (Deliverable D1.1 “POPD - 
Requirement No. 2”), MOSAICrOWN does not use or manage any real-life data. As a matter 
of fact, MOSAICrOWN strictly builds the technical framework and solutions for the digital data 
market, but neither collects nor uses any personal data either directly or indirectly. 

Therefore, data produced by MOSAICrOWN are technical solutions, in the form of documents 
or software tools. 

The MOSAICrOWN Consortium has been committed to timely and rapid distribution of the 
project’s results, making them widely available and openly accessible. MOSAICrOWN pursues 
an open-access policy, making results and publications publicly available. Data needed for the 
coordination of, and collaboration in, the project (such as work communications and progress 
reports on the technical work) have been restricted to the project participants. 

In this deliverable, we describe the data management plan that regulated the different kinds of 
data handled or produced by MOSAICrOWN. 


1. Data Management Plan 


Data produced by the project are technical solutions (in the form of documents or software tools). 
The project has not acquired or produced any specific data sets. Therefore, in the following we
describe the data management plan with reference to the kinds of data managed or produced by the
project, distinguishing among the following:


1. data/information about the project, that is, all static data describing the project that are
produced for dissemination purposes (e.g., fact sheet, consortium, objectives, vision, planned
work and its organization in work packages), as well as data on project progress (e.g., news
reporting related events, and information relevant to dissemination and exploitation);


2. data for the coordination of the project, that is, all information needed for the
communication and interaction among the partners working in the project (e.g., name and contact
information of people working in the project, meetings, communication among partners,
and intermediate progress);


3. documents produced by the project, that is, scientific papers and deliverables/work docu- 
ments presenting scientific and technical solutions produced by MOSAICrOWN; 


4. software tools produced by the project, that is, object and source code of the software imple- 
menting the scientific and technical solutions of the project, together with their associated 
metadata (e.g., unique identifier, creator/s, and versions). 


There are three main servers where data on the project have been hosted. These have been 
dedicated to manage the following: 


• Web site of the project (https://mosaicrown.eu), with a public and a restricted area;


• Project document repository SVN (https://mosaicrown.eu/svn/mosaicrown/), for the project’s
internal working and communication;


• Mailing lists (mosaicrown-...@di.unimi.it), for project coordination and communication.


All servers reside within the premises of the project coordinator (UNIMI) and are managed by 
administrative personnel of UNIMI. They are backed up weekly. 
Data about the project


Data set description 


This kind of data refers to all the data describing the project that have to be made publicly
available (for dissemination purposes), such as fact sheet, consortium, objectives, vision,
planned work and its organization in work packages. In addition to these static data about the
project, this kind of data includes information that has been continuously updated as the
project progressed, such as news reporting related events and information relevant to
dissemination and exploitation.


Standards and metadata 
Data are organized with a hierarchical structure allowing easy retrieval and navigation. Content
management is handled with the WordPress content management structure.


Data sharing 


Data are publicly accessible via the project web site: https://mosaicrown.eu. 


Archiving and preservation 


Data are stored on a server residing within the premises of the project coordinator (UNIMI) 
and are managed by administrative personnel of UNIMI. The server is backed up weekly. The data
will be maintained for at least five years after the end of the project.

Example - MOSAICrOWN home page 




MOSAICrOWN 


Multi-Owner data Sharing for Analytics and Integration respecting Confidentiality and OWNer control 


Home | The project | Results | News and events | Contact | Restricted area | Reviewer area


Welcome to MOSAICrOWN 


Background and Motivation 


The application of data analysis techniques over large data collections provides great benefits, from the personal, to the business, research, and social 
domain. The availability of large data collections recording actions and choices of individuals and organisations can lead to great improvement in the 
understanding of how the world operates. The continuous evolution of ICT is enabling the realisation of such vision at a fast pace, supporting the 
realisation of architectures enabling collaborative data sharing and analytics. A clear obstacle towards the realisation of such potential and vision is 
represented by security and privacy concerns. Indeed, the (actual or perceived) loss of control over data and potential compromise of their 
confidentiality can have a strong detrimental impact on the realisation of an open framework for enabling the sharing of data from multiple independent 
data owners. 


The Vision of MOSAICrOWN 


MOSAICrOWN aims to enable data sharing and collaborative analytics in multi-owner scenarios in a privacy-preserving way, ensuring 
proper protection of private/sensitive/confidential information. MOSAICrOWN will provide effective and deployable solutions allowing 
data owners to maintain control on the data sharing process, enabling selective and sanitized disclosure providing for efficient and 
scalable privacy-aware collaborative computations. 


This goal will be achieved by providing: i) a data governance framework able to capture and combine the protection requirements that can be possibly 
specified by multiple parties, who have a say over the data, to empower them with more control over such data; ii) effective and efficient protection 
techniques that can be integrated in current technologies and that enforce protection while enabling efficient and scalable data sharing and 
processing. 


Data Market 
Data for the project coordination and collaboration


Data set description 


This kind of data refers to the information needed for the coordination and collaborative work 
of the project itself, such as name and contact information of people working in the project, 
meetings, communication among them, and intermediate progress. As established by the
Consortium Agreement (art 10.9), for people working on the project, only contact information 
has been acquired, and MOSAICrOWN has not made available any personal data to other 
parties or processed any personal data on behalf of other parties. 


Standards and metadata 
Contact information is organized in different mailing lists to allow easy retrieval and reference.


Work in progress and documents for collaborative working are organized in a hierarchical 
structure with nested directories and files with self-explanatory names, and are accessed via 
an SVN service. Mailing lists and SVN information are also accessible via the restricted area 
of the project web site using individual credentials (login/password) assigned to each project
participant.


Data sharing 


Data sharing is restricted to project participants. Access to the SVN service and to the
different mailing lists of the project is available to all project participants and is
regulated by a login and a (randomly generated) password distributed to each individual
participant by the administrator of the server.
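
As an illustration of how such credentials could be issued, the following minimal sketch generates one random password per participant with Python's standard `secrets` module. The deliverable does not state how the server administrator generates passwords, so the function names, the password length, the alphabet, and the example logins are all assumptions made for this sketch.

```python
import secrets
import string
from typing import Dict, List

# Assumed policy (not stated in the deliverable): 16 characters, letters and digits.
ALPHABET = string.ascii_letters + string.digits
PASSWORD_LENGTH = 16


def generate_password(length: int = PASSWORD_LENGTH) -> str:
    """Return a password drawn from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def issue_credentials(logins: List[str]) -> Dict[str, str]:
    """Map each participant login to its own freshly generated password."""
    return {login: generate_password() for login in logins}


if __name__ == "__main__":
    # Hypothetical logins, for illustration only.
    for login, password in issue_credentials(["participant1", "participant2"]).items():
        print(login, password)
```

Using `secrets` rather than `random` matters here because the latter is not designed for security-sensitive values.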


Archiving and preservation 


Data are stored on a server sitting within the premises of the project coordinator (UNIMI) and 
are managed by administrative personnel of UNIMI. The server is backed up weekly. The data
will be maintained for at least five years after the end of the project.


Example - Main mailing lists


mosaicrown@di.unimi.it 
mosaicrown-mb@di.unimi.it 
mosaicrown-tb@di.unimi.it 
mosaicrown-pubs@di.unimi.it 
mosaicrown-adm@di.unimi.it 
Project results - Documents


Data set description 
This kind of data refers to all documents produced by the project. Among them we distinguish:
deliverables/work documents and scientific papers.


Standards and metadata 
All documents are made available in PDF format. 


Deliverables and work documents are identified following an easy-to-use notation with three 
parts: a letter that determines the kind of document (D for Deliverable, W for Work docu- 
ment); the number of the work package that coordinates its production; and a serial number 
that uniquely identifies the document within that work package. 
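
The notation lends itself to a mechanical check. The following is a minimal sketch, not part of the project tooling: the names `DocumentId` and `parse_document_id` are hypothetical, and the dot separating work package and serial number is an assumption taken from identifiers such as D7.1 that appear on the project's deliverables page.

```python
import re
from typing import NamedTuple


class DocumentId(NamedTuple):
    kind: str          # "Deliverable" or "Work document"
    work_package: int  # work package coordinating the document
    serial: int        # serial number within that work package


_KINDS = {"D": "Deliverable", "W": "Work document"}
# Assumed textual form: letter, work package number, dot, serial number.
_PATTERN = re.compile(r"([DW])(\d+)\.(\d+)")


def parse_document_id(identifier: str) -> DocumentId:
    """Split an identifier such as 'D7.1' into its three parts."""
    match = _PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a valid document identifier: {identifier!r}")
    letter, work_package, serial = match.groups()
    return DocumentId(_KINDS[letter], int(work_package), int(serial))


if __name__ == "__main__":
    print(parse_document_id("D7.1"))
```

For example, `parse_document_id("D7.1")` yields a deliverable coordinated by work package 7 with serial number 1, while a string that does not follow the notation raises `ValueError`.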


All documents have BibTeX metadata, which ease searching and can be exported for easy
reference. These metadata include the ones prescribed in the Grant Agreement (art 29.2).


Data sharing 


Intellectual Property Rights (IPR) and dissemination of project results are regulated by the
Consortium Agreement (art 8). IPR of results remain with the project party that generated them.
Sharing and dissemination follow an open access policy. For sharing and dissemination, we 
distinguish work documents, deliverables, and scientific papers. 


Work documents, being internal deliverables produced for assessing the progress of work and
for official communication of intermediate results among partners, are visible only to project
participants.


All deliverables (apart from those reporting financial or exploitation information) are classified 
as public (PU). Consequently, they are made accessible to the general public following their
submission to the EC. This dissemination and sharing is made via the web site of the project
(https://mosaicrown.eu, link “Results/Deliverables”). 


MOSAICrOWN embraces an open access policy, and all scientific papers are also made publicly
available via the project web site. Before public release, each paper undergoes an internal
process within the MOSAICrOWN Consortium, regulated by the Grant Agreement (art. 29) and the
Consortium Agreement (art. 8.3) of the project. In particular, “prior notice of any planned
publication shall be given to concerned Parties at least 21 calendar days before the publication.
Any objection to the planned publication shall be made in accordance with the Grant Agreement in
writing to the Coordinator and to the Party or Parties proposing the dissemination within 15
calendar days after receipt of the notice. If no objection is made within the time limit stated
above, the publication is permitted.” At this point, the paper is made publicly available via the
project public web site (https://mosaicrown.eu, link “Results/Publications”).
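
The two time limits in the quoted clause can be turned into simple date arithmetic. The sketch below is only an illustration: the function names are hypothetical, it assumes that the notice is received on the day it is sent, and it assumes that an objection arriving exactly on the fifteenth day still counts as being within the limit. How the Consortium actually counts days is not stated in this deliverable.

```python
from datetime import date, timedelta
from typing import Iterable

NOTICE_DAYS = 21     # minimum prior notice before the publication
OBJECTION_DAYS = 15  # window for objections after receipt of the notice


def latest_notice_date(publication: date) -> date:
    """Last day on which notice can be given for a planned publication date."""
    return publication - timedelta(days=NOTICE_DAYS)


def objection_deadline(notice_received: date) -> date:
    """Last day on which an objection is still within the time limit."""
    return notice_received + timedelta(days=OBJECTION_DAYS)


def publication_permitted(notice_received: date, publication: date,
                          objection_dates: Iterable[date]) -> bool:
    """True if notice was timely and no objection arrived within the window."""
    timely = notice_received <= latest_notice_date(publication)
    deadline = objection_deadline(notice_received)
    objected = any(d <= deadline for d in objection_dates)
    return timely and not objected


if __name__ == "__main__":
    planned = date(2021, 3, 22)  # hypothetical publication date
    print(latest_notice_date(planned), objection_deadline(latest_notice_date(planned)))
```

Because the objection window (15 days) is shorter than the notice period (21 days), a notice given on time always leaves the objection window closed before the planned publication date.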
Before papers are accepted for publication at a scientific venue, sharing and dissemination
are made possible via the partners’ institutional archives and other open archives (e.g., arXiv).
For papers accepted for formal publication by publishers, public access is made possible via
“green” open-access publication (i.e., “self-archiving”), which is in line with the copyright
policy of the major institutions and associations behind the most selective and recognized
conferences and journals. Under this policy, the publishers allow authors to post the final
versions of accepted papers on their personal web site, the web site of their employers, or
selected pre-authorized institutional web sites. Project publications are then hosted (and
publicly accessible) on the partners’ web sites or open archives and linked from the web site
of the project, thus ensuring broad visibility and easy access.


Archiving and preservation 


Deliverables are stored on a server sitting within the premises of the project coordinator 
(UNIMI) and are managed by administrative personnel of UNIMI. The server is backed up
weekly. The deliverables will be maintained for at least five years after the end of the project.


Scientific papers are linked from the project web site and are stored on the partners’
institutional archives (also periodically backed up) and other open archives (like arXiv), which
provide guarantees of continuity of service and preservation of access.

Example - Web page presenting deliverables


MOSAICrOWN 


Multi-Owner data Sharing for Analytics and Integration respecting Confidentiality and OWNer control 


Home | The project | Results | News and events | Contact | Restricted area | Reviewer area


Deliverables 


D1.1 POPD - Requirement No.2 (M6) 
Ethics report. 


D7.1 Data Management Plan (M6) [pdf] [bib] 
It provides a detailed description of the data management plan. 


D2.1 Requirements from the Use Cases (M12) [pdf] [bib] 
It provides the final version of the data protection requirements coming from the Use Cases. 


D3.1 First version of the reference metadata model (M15) [pdf] [bib] 
It provides a preliminary version of the reference metadata model. 


D3.2 Preliminary version of tools for the governance framework (M17) [pdf] [bib] 
It provides a preliminary version of the tools developed for the definition of the data governance framework for the management of data in 
collaborative platforms. 


D4.1 First version of encryption-based protection tools (M17) [pdf] [bib] 
It provides a first version of the tools enforcing encryption-based data protection techniques. 


D5.1 First version of data sanitisation tools (M17) [pdf] [bib] 
It provides a first version of the tools developing data sanitisation techniques. 


D2.2 Report on requirements, research alignment and deployment plan (M18) [pdf] [bib] 
It reports on the status of the alignment between the research and technological development in WPs2-4 and the requirements of the
Use Cases.


D3.3 First version of policy specification language and model (M18) [pdf] [bib] 
It provides a preliminary version of the model and language for policy specification. 


D4.2 Report on encryption-based techniques and policy enforcement (M18) [pdf] [bib] 
It provides a first version of the techniques for policy enforcement, based on the basic protection techniques developed in W3.1. 


D5.2 First report on privacy metrics and data sanitisation (M18) [pdf] [bib] 
It illustrates the metrics designed for measuring privacy and the techniques developed for data sanitisation. 


D6.1 First report on dissemination, communication, and exploitation (M18) 
It provides a description of the activities performed to disseminate and exploit the project's results in the first reporting period. 


D4.3 Final encryption-based techniques (M24) [pdf] [bib] 
It provides the final version of encryption-based data protection techniques and tools. 
Project results - Software tools


Data set description 


This kind of data refers to all software tools produced by the project. All software tools devel- 
oped within the project have also corresponding deliverables or work documents. 


Standards and metadata 


The code of the software tools produced by the project is written in commonly used programming
languages (e.g., Java, C++, Python) or as shell scripts. Every tool has associated structured
metadata, specifying information such as unique identifier, creator/s, and versions, allowing
for easy reference and retrieval.
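
A metadata record of this kind could be represented as follows. This is a minimal sketch under stated assumptions: the deliverable names the kinds of information recorded (unique identifier, creator/s, versions) but not a concrete schema, so the class `ToolMetadata`, its field names, the helper `build_index`, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ToolMetadata:
    identifier: str                  # unique identifier of the tool
    creators: Tuple[str, ...]        # creator/s of the tool
    versions: Tuple[str, ...] = ()   # released versions, oldest first

    def latest_version(self) -> Optional[str]:
        """Return the most recent version, or None if none is recorded."""
        return self.versions[-1] if self.versions else None


def build_index(records: Iterable[ToolMetadata]) -> Dict[str, ToolMetadata]:
    """Index records by identifier, rejecting duplicate identifiers."""
    index: Dict[str, ToolMetadata] = {}
    for record in records:
        if record.identifier in index:
            raise ValueError(f"duplicate identifier: {record.identifier}")
        index[record.identifier] = record
    return index


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    tool = ToolMetadata("tool-001", ("Example Partner",), ("0.1", "0.2"))
    print(build_index([tool])["tool-001"].latest_version())
```

Enforcing identifier uniqueness at indexing time is what makes retrieval by identifier unambiguous.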


Data sharing 


IPR and dissemination of project results are regulated by the Consortium Agreement (art 8).
IPR of results remain with the project party that generated them.

As for sharing, software tools developed in the project, corresponding to work documents and 
deliverables, subject to the applicable IPR clauses, are classified as public (PU). 

Object code of tools developed by academic partners, subject to the applicable IPR clauses, is 
accessible to the general public. Public dissemination and sharing is made via the web site of 
the project (https://mosaicrown.eu). 

Source code of software tools developed by academic partners is provided as building blocks 
under an open source license and as applicable under the relevant IPR agreement. 


Archiving and preservation 

Object and source code of tools produced within MOSAICrOWN is stored and managed by 
the software creator. Object code is also stored for dissemination in public repositories (e.g., 
GitHub). 


Example - Pointer to object code 


• https://mosaicrown.eu/


[Figure: screenshot of the MOSAICrOWN public repository page (tabs: Overview, Repositories, Packages, People, Projects), listing the following popular repositories.]


• mondrian (public, Python): Spark-based Mondrian implementation
• query-opt (public, Java): Secure query distribution and cost optimization
• aesmix (public, C): aesmix library for encrypting/decrypting data with the Mix&Slice AONT
• policy-engine (public, Python): Tool enforcing the MOSAICrOWN policy language
• freya-fs (public): forked from micheleberetta98/freya-fs
• ns (public)
2. Conclusions


This deliverable provided a description of the Data Management Plan (DMP) for managing the
data generated and collected during the project. Specifically, the DMP described the data
management life cycle for all datasets produced by the project. It covered:


• data/information about the project;
• data for the working of the project itself;
• documents (scientific papers, and deliverables/work documents) produced by the project;
• software tools produced by the project.


For each dataset, this deliverable included a description and information about the methodol- 
ogy and standards applied, whether the dataset is shared/made open and how, and how the dataset 
is preserved. 